TEI and LMF crosswalks
The present paper explores various arguments in favour of making the Text
Encoding Initiative (TEI) guidelines an appropriate serialisation for ISO
standard 24613:2008 (LMF, Lexical Mark-up Framework). It also identifies the
issues that would have to be resolved in order to reach an appropriate
implementation of these ideas, in particular in terms of informational
coverage. We show how the customisation facilities offered by the TEI
guidelines can provide an adequate background, not only to cover missing
components within the current Dictionary chapter of the TEI guidelines, but
also to allow specific lexical projects to deal with local constraints. We
expect this proposal to be a basis for a future ISO project in the context of
the ongoing revision of LMF.
Stabilizing knowledge through standards - A perspective for the humanities
Standards commonly generate mixed feelings among scientists. They are often
seen as not really reflecting the state of the art in a given domain and as a
hindrance to scientific creativity. Still, scientists should in principle be
best placed to bring their expertise into standards development, being more
neutral on issues that typically involve competing industrial interests.
Although developing standards may seem even more complex in the humanities, we
will show how it can be made feasible through the experience gained both
within the Text Encoding Initiative consortium and the International
Organisation for Standardisation. By taking the specific case of lexical
resources, we will try to show how this brings about new ideas for designing
future research infrastructures in the human and social sciences.
Encoding models for scholarly literature
We examine the issue of digital formats for document encoding, archiving and
publishing, through the specific example of "born-digital" scholarly journal
articles. We will begin by looking at the traditional workflow of journal
editing and publication, and how these practices have made the transition into
the online domain. We will examine the range of different file formats in which
electronic articles are currently stored and published. We will argue strongly
that, despite the prevalence of binary and proprietary formats such as PDF and
MS Word, XML is a far superior encoding choice for journal articles. Next, we
look at the range of XML document structures (DTDs, Schemas) which are in
common use for encoding journal articles, and consider some of their strengths
and weaknesses. We will suggest that, despite the existence of specialized
schemas intended specifically for journal articles (such as NLM), and more
broadly-used publication-oriented schemas such as DocBook, there are strong
arguments in favour of developing a subset or customization of the Text
Encoding Initiative (TEI) schema for the purpose of journal-article encoding;
TEI is already in use in a number of journal publication projects, and the
scale and precision of the TEI tagset make it particularly appropriate for
encoding scholarly articles. We will outline the document structure of a
TEI-encoded journal article, and look in detail at suggested markup patterns
for specific features of journal articles.
Multiple Retrieval Models and Regression Models for Prior Art Search
This paper presents the system called PATATRAS (PATent and Article Tracking,
Retrieval and AnalysiS) realized for the IP track of CLEF 2009. Our approach
presents three main characteristics: 1. The usage of multiple retrieval models
(KL, Okapi) and term index definitions (lemma, phrase, concept) for the three
languages considered in the present track (English, French, German) producing
ten different sets of ranked results. 2. The merging of the different results
based on multiple regression models using an additional validation set created
from the patent collection. 3. The exploitation of patent metadata and of the
citation structures for creating restricted initial working sets of patents and
for producing a final re-ranking regression model. As we exploit specific
metadata of the patent documents and the citation relations only at the
creation of initial working sets and during the final post ranking step, our
architecture remains generic and easy to extend.
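
The merging step described in this abstract (combining ranked results from several retrieval models through regression fitted on a validation set) can be illustrated with a minimal sketch. This is not the PATATRAS implementation: the function names `fit_linear_weights` and `fuse`, the toy scores, and the use of plain ordinary least squares are all assumptions made for illustration. The abstract does not specify the regression features or the fitting procedure.

```python
from typing import Dict, List, Sequence


def fit_linear_weights(features: Sequence[Sequence[float]],
                       targets: Sequence[float]) -> List[float]:
    """Ordinary least squares with a bias term (hypothetical helper).

    Solves the normal equations (X^T X) w = X^T y by Gauss-Jordan
    elimination with partial pivoting. Returns one weight per feature,
    followed by the bias as the last element.
    """
    rows = [list(f) + [1.0] for f in features]  # append bias column
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, targets))]
        for i in range(n)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are linearly dependent")
        a[col], a[pivot] = a[pivot], a[col]
        pivot_value = a[col][col]
        a[col] = [v / pivot_value for v in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][n] for i in range(n)]


def fuse(run_scores: Dict[str, List[float]], weights: List[float]) -> List[str]:
    """Rank documents by the weighted sum of their per-model scores plus bias."""
    fused = {
        doc: sum(w * s for w, s in zip(weights, scores)) + weights[-1]
        for doc, scores in run_scores.items()
    }
    return sorted(fused, key=fused.get, reverse=True)


if __name__ == "__main__":
    # Toy validation set (assumed): each row holds the scores two retrieval
    # models gave to a (query, document) pair; the target is its relevance.
    validation_features = [[0.9, 0.2], [0.1, 0.8], [0.7, 0.9], [0.2, 0.1]]
    validation_relevance = [1.0, 0.0, 1.0, 0.0]
    weights = fit_linear_weights(validation_features, validation_relevance)

    # Scores of candidate patents under the same two models (assumed values).
    candidates = {"patent_A": [0.8, 0.3], "patent_B": [0.2, 0.7]}
    print(fuse(candidates, weights))
```

Fitting the weights on a held-out validation set, as the abstract describes, keeps the fusion step independent of any single retrieval model, so further result sets can be added as extra feature columns.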
Data fluidity in DARIAH -- pushing the agenda forward
This paper provides both an update concerning the setting up of the European
DARIAH infrastructure and a series of strong action lines related to the
development of a data-centred strategy for the humanities in the coming years.
In particular we tackle various aspects of data management: data hosting, the
setting up of a DARIAH seal of approval, the establishment of a charter between
cultural heritage institutions and scholars and finally a specific view on
certification mechanisms for data.
Scholarly Communication
The chapter tackles the role of scholarly publication in the research process
(quality, preservation) and looks at the consequences of new information
technologies in the organization of the scholarly communication ecology. It
will then show how new technologies have had an impact on the scholarly
communication process and made it depart from the traditional publishing
environment. Developments will address new editorial processes, dissemination
of new content and services, as well as the development of publication
archives. This last aspect will be covered on all levels (open access,
scientific, technical and legal aspects). A view on the possible evolutions of
the scientific publishing environment will be provided.
Comment: To appear in Mehler, Romary, Gibbon (eds), Technical Communication,
M. de Gruyter, Berlin (2011)